fix(backend): add dead worker failover monitor (#1769) #2487

Merged
mrveiss merged 1 commit into Dev_new_gui from fix/issue-1769
Mar 26, 2026

@mrveiss mrveiss commented Mar 26, 2026

Closes #1769

Summary

  • Added failover_monitor coroutine to NPUWorkerManager
  • Detects dead workers via expired heartbeat TTL (npu:worker:{id}:status)
  • Re-queues orphaned tasks from running → pending (with retry tracking)
  • Moves tasks that exceed max_retries to the failed queue
  • Launches as a background task alongside the existing health monitoring
  • Keeps all new functions under 30 lines (helpers extracted)
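
For readers without the diff open, the flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: only `failover_monitor`, `max_retries`, and the `npu:worker:{id}:status` key pattern come from this PR. The `InMemoryStore` stand-in for Redis, the `Task` record, the helper names `find_dead_workers` and `requeue_orphans`, the queue layout, the polling interval, and the exact retry threshold are all hypothetical.

```python
import asyncio
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("failover_sketch")


@dataclass
class Task:
    """Hypothetical task record; the real schema is not shown in the PR."""
    task_id: str
    worker_id: str
    retries: int = 0


@dataclass
class InMemoryStore:
    """Stand-in for Redis (assumption): live heartbeat keys plus three queues."""
    heartbeats: set = field(default_factory=set)   # status keys whose TTL has not expired
    running: dict = field(default_factory=dict)    # task_id -> Task
    pending: list = field(default_factory=list)
    failed: list = field(default_factory=list)


def status_key(worker_id: str) -> str:
    # Key pattern taken from the PR summary.
    return f"npu:worker:{worker_id}:status"


def find_dead_workers(store: InMemoryStore) -> set:
    """A worker counts as dead when it owns running tasks but its status key is gone."""
    owners = {task.worker_id for task in store.running.values()}
    return {w for w in owners if status_key(w) not in store.heartbeats}


def requeue_orphans(store: InMemoryStore, dead: set, max_retries: int) -> tuple:
    """Move orphaned tasks to pending, or to failed once retries exceed max_retries."""
    requeued = failed = 0
    for task_id, task in list(store.running.items()):
        if task.worker_id not in dead:
            continue
        del store.running[task_id]
        task.retries += 1
        if task.retries > max_retries:
            store.failed.append(task)
            failed += 1
            logger.warning("task %s failed after %d retries", task_id, task.retries)
        else:
            store.pending.append(task)
            requeued += 1
            logger.info("task %s requeued from dead worker %s", task_id, task.worker_id)
    return requeued, failed


async def failover_monitor(store, max_retries=3, interval=5.0, stop=None):
    """Poll for dead workers until `stop` is set (interval and default are assumed)."""
    if stop is None:
        stop = asyncio.Event()
    while not stop.is_set():
        dead = find_dead_workers(store)
        if dead:
            requeue_orphans(store, dead, max_retries)
        try:
            # Sleep for `interval`, but wake immediately if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

In a sketch like this the monitor would be started with `asyncio.create_task(failover_monitor(store))` next to the health-monitoring task, and the in-memory store would be replaced by the manager's existing Redis client, in line with the "no new client creation" item in the test plan.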

Test plan

  • File compiles without errors
  • All pre-commit hooks pass (Black, flake8, bandit)
  • Uses existing Redis patterns (no new client creation)
  • Respects max_retries limit
  • Logs all task migrations

@mrveiss mrveiss merged commit 3e951d5 into Dev_new_gui Mar 26, 2026
2 of 4 checks passed
@mrveiss mrveiss deleted the fix/issue-1769 branch March 26, 2026 19:28
@github-actions

✅ SSOT Configuration Compliance: Passing

🎉 No hardcoded values detected that have SSOT config equivalents!
